Overview

Dataset statistics

Number of variables: 24
Number of observations: 3670
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1273
Duplicate rows (%): 34.7%
Total size in memory: 688.3 KiB
Average record size in memory: 192.0 B

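The duplicate-row figure counts whole rows that repeat. The report does not say which counting convention ydata-profiling uses. The sketch below assumes one common convention: every repeat after a row's first occurrence counts as a duplicate, which is how pandas' `DataFrame.duplicated()` behaves with its default `keep="first"`. The helper name `duplicate_row_stats` is illustrative, not a library function.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

Row = Tuple[Hashable, ...]


def duplicate_row_stats(rows: Sequence[Row]) -> Tuple[int, float]:
    """Return (number of duplicate rows, duplicates as a % of all rows).

    Assumed convention: every occurrence of a row after its first one
    counts as a duplicate (like pandas' duplicated(keep="first")).
    """
    if not rows:
        return 0, 0.0
    occurrences = Counter(rows)
    # A row seen c times contributes c - 1 duplicates.
    n_dupes = sum(c - 1 for c in occurrences.values())
    return n_dupes, 100.0 * n_dupes / len(rows)


toy = [("female", "20000"), ("male", "50000"), ("female", "20000"), ("male", "90000")]
print(duplicate_row_stats(toy))  # (1, 25.0)

# Arithmetic check against the report: 1273 duplicates out of 3670 rows.
print(round(100 * 1273 / 3670, 1))  # 34.7
```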
Variable types

Text: 14
Categorical: 10

Alerts

Dataset has 1273 (34.7%) duplicate rows [Duplicates]
X10 is highly correlated overall with X11 and 8 other fields [High correlation]
X11 is highly correlated overall with X10 and 6 other fields [High correlation]
X2 is highly correlated overall with X10 and 8 other fields [High correlation]
X3 is highly correlated overall with X10 and 8 other fields [High correlation]
X4 is highly correlated overall with X10 and 8 other fields [High correlation]
X6 is highly correlated overall with X10 and 7 other fields [High correlation]
X7 is highly correlated overall with X10 and 7 other fields [High correlation]
X8 is highly correlated overall with X10 and 8 other fields [High correlation]
X9 is highly correlated overall with X10 and 8 other fields [High correlation]
Y is highly correlated overall with X10 and 8 other fields [High correlation]
X4 is highly imbalanced (52.0%) [Imbalance]
Y is highly imbalanced (52.0%) [Imbalance]

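The "High correlation" alerts flag pairs of columns that are strongly associated. As we understand ydata-profiling's default ("auto") correlation setting, it scores pairs of categorical columns with Cramér's V and raises the alert when the score exceeds a configurable threshold. This report does not show the threshold value. The sketch below computes plain, uncorrected Cramér's V for two equal-length columns using only the standard library. The function name `cramers_v` is ours, and the library may apply a bias correction that this sketch omits.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Uncorrected Cramér's V between two categorical columns (0 = none, 1 = perfect)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # a constant column carries no association
    joint = Counter(zip(xs, ys))
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


sex = ["female", "male", "female", "male"] * 5
print(cramers_v(sex, list(sex)))  # 1.0 (a column is perfectly associated with itself)
```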
Reproduction

Analysis started: 2023-12-10 20:30:52.285511
Analysis finished: 2023-12-10 20:30:54.985675
Duration: 2.7 seconds
Software version: ydata-profiling v4.6.3

Variables

X1
Text

Distinct: 63
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 9
Median length: 6
Mean length: 5.6168937
Min length: 5

Characters and Unicode

Total characters: 20614
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

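The category counts in these sections come from each character's Unicode general category. Python's standard `unicodedata.category()` returns a two-letter code, for example `Nd` or `Lu`. The sketch below maps the codes that occur in this report to the long names the report prints. The mapping table and `category_counts` are illustrative helpers. Unicode scripts and blocks are not covered, because the Python standard library does not expose those properties.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# One "LIMIT_BAL" value yields 8 uppercase letters and 1 underscore; X1 contains
# that value twice, which matches its 16 uppercase and 2 connector characters.
print(category_counts(["LIMIT_BAL", "20000"]))
```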
Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: LIMIT_BAL
2nd row: 20000
3rd row: 120000
4th row: 90000
5th row: 50000

Value | Count | Frequency (%)
50000 | 453 | 12.3%
20000 | 236 | 6.4%
30000 | 191 | 5.2%
200000 | 182 | 5.0%
80000 | 165 | 4.5%
180000 | 135 | 3.7%
360000 | 122 | 3.3%
100000 | 118 | 3.2%
140000 | 117 | 3.2%
150000 | 115 | 3.1%
Other values (53) | 1836 | 50.0%

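The first sample row of X1 is the string LIMIT_BAL, and that string occurs twice among the values. It accounts for the 16 uppercase letters and 2 underscores counted in the character tables for this variable. The pattern suggests that header lines were read as data, which would also explain why a numeric column like this one is typed as Text. That is our inference; the report does not state it. A minimal sketch of dropping such embedded header cells and parsing the remaining cells as integers, with `drop_embedded_headers` as a hypothetical helper:

```python
from typing import Iterable, List


def drop_embedded_headers(values: Iterable[str], column_name: str) -> List[int]:
    """Remove cells that merely repeat the column name, then parse the rest as ints.

    Assumes every remaining cell is an integer literal such as "20000" or "-1";
    int() raises ValueError otherwise, which surfaces any other stray text.
    """
    return [int(v) for v in values if v.strip() != column_name]


raw = ["LIMIT_BAL", "20000", "120000", "90000", "LIMIT_BAL", "50000"]
print(drop_embedded_headers(raw, "LIMIT_BAL"))  # [20000, 120000, 90000, 50000]
```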
Most occurring characters

Value | Count | Frequency (%)
0 | 15154 | 73.5%
1 | 1189 | 5.8%
2 | 1174 | 5.7%
3 | 783 | 3.8%
5 | 743 | 3.6%
8 | 391 | 1.9%
6 | 385 | 1.9%
4 | 381 | 1.8%
7 | 213 | 1.0%
9 | 183 | 0.9%
Other values (7) | 18 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 20596 | 99.9%
Uppercase Letter | 16 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 15154 | 73.6%
1 | 1189 | 5.8%
2 | 1174 | 5.7%
3 | 783 | 3.8%
5 | 743 | 3.6%
8 | 391 | 1.9%
6 | 385 | 1.9%
4 | 381 | 1.8%
7 | 213 | 1.0%
9 | 183 | 0.9%
Uppercase Letter
Value | Count | Frequency (%)
L | 4 | 25.0%
I | 4 | 25.0%
M | 2 | 12.5%
T | 2 | 12.5%
B | 2 | 12.5%
A | 2 | 12.5%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

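The same character can show two different percentages. For example, the digit 0 is 73.5% in the overall character table but 73.6% in the Decimal Number table. Each table divides by its own total: all 20614 characters in the first case, and the 20596 Decimal Number characters in the second. A small arithmetic check (the helper name `pct` is ours):

```python
def pct(count: int, total: int) -> float:
    """Share of count in total, as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


zeros = 15154  # occurrences of "0" in X1
print(pct(zeros, 20614))  # 73.5  (denominator: all characters)
print(pct(zeros, 20596))  # 73.6  (denominator: Decimal Number characters only)
```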
Most occurring scripts

Value | Count | Frequency (%)
Common | 20598 | 99.9%
Latin | 16 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 15154 | 73.6%
1 | 1189 | 5.8%
2 | 1174 | 5.7%
3 | 783 | 3.8%
5 | 743 | 3.6%
8 | 391 | 1.9%
6 | 385 | 1.9%
4 | 381 | 1.8%
7 | 213 | 1.0%
9 | 183 | 0.9%
Latin
Value | Count | Frequency (%)
L | 4 | 25.0%
I | 4 | 25.0%
M | 2 | 12.5%
T | 2 | 12.5%
B | 2 | 12.5%
A | 2 | 12.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 20614 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 15154 | 73.5%
1 | 1189 | 5.8%
2 | 1174 | 5.7%
3 | 783 | 3.8%
5 | 743 | 3.6%
8 | 391 | 1.9%
6 | 385 | 1.9%
4 | 381 | 1.8%
7 | 213 | 1.0%
9 | 183 | 0.9%
Other values (7) | 18 | 0.1%

X2
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

female: 2130
male: 1538
SEX: 2

Length

Max length: 6
Median length: 6
Mean length: 5.160218
Min length: 3

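The length statistics can be reproduced from the value counts alone, because every occurrence of a value has the same length. The sketch below assumes the report's mean is the plain average over all rows; that assumption reproduces X2's figures exactly (18938 characters in total, mean 5.160218). `length_stats` is an illustrative name.

```python
import statistics
from typing import Dict


def length_stats(value_counts: Dict[str, int]) -> Dict[str, float]:
    """Summarise string lengths for a column given {value: count}."""
    lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


x2 = {"female": 2130, "male": 1538, "SEX": 2}
stats = length_stats(x2)
print(stats["total_characters"], round(stats["mean"], 6))  # 18938 5.160218
```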
Characters and Unicode

Total characters: 18938
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

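Distinct counts the different values in a column. Unique counts the values that occur exactly once. Both percentages are taken over all 3670 rows. In X2 no value occurs only once, so Unique is 0. A small sketch with an illustrative `distinct_and_unique` helper:

```python
from collections import Counter
from typing import Dict, Sequence


def distinct_and_unique(values: Sequence[str]) -> Dict[str, float]:
    """Distinct = number of different values; Unique = values seen exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": round(100 * distinct / n, 1),
        "unique": unique,
        "unique_pct": round(100 * unique / n, 1),
    }


x2 = ["female"] * 2130 + ["male"] * 1538 + ["SEX"] * 2
print(distinct_and_unique(x2))
# {'distinct': 3, 'distinct_pct': 0.1, 'unique': 0, 'unique_pct': 0.0}
```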
Sample

1st row: SEX
2nd row: female
3rd row: female
4th row: female
5th row: female

Common Values

Value | Count | Frequency (%)
female | 2130 | 58.0%
male | 1538 | 41.9%
SEX | 2 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
female | 2130 | 58.0%
male | 1538 | 41.9%
sex | 2 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
e | 5798 | 30.6%
m | 3668 | 19.4%
a | 3668 | 19.4%
l | 3668 | 19.4%
f | 2130 | 11.2%
S | 2 | < 0.1%
E | 2 | < 0.1%
X | 2 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 18932 | > 99.9%
Uppercase Letter | 6 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 5798 | 30.6%
m | 3668 | 19.4%
a | 3668 | 19.4%
l | 3668 | 19.4%
f | 2130 | 11.3%
Uppercase Letter
Value | Count | Frequency (%)
S | 2 | 33.3%
E | 2 | 33.3%
X | 2 | 33.3%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 18938 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 5798 | 30.6%
m | 3668 | 19.4%
a | 3668 | 19.4%
l | 3668 | 19.4%
f | 2130 | 11.2%
S | 2 | < 0.1%
E | 2 | < 0.1%
X | 2 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 18938 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 5798 | 30.6%
m | 3668 | 19.4%
a | 3668 | 19.4%
l | 3668 | 19.4%
f | 2130 | 11.2%
S | 2 | < 0.1%
E | 2 | < 0.1%
X | 2 | < 0.1%

X3
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

university: 1644
graduate school: 1401
high school: 596
other: 27
EDUCATION: 2

Length

Max length: 15
Median length: 11
Mean length: 12.033787
Min length: 5

Characters and Unicode

Total characters: 44164
Distinct characters: 26
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: EDUCATION
2nd row: university
3rd row: university
4th row: university
5th row: university

Common Values

Value | Count | Frequency (%)
university | 1644 | 44.8%
graduate school | 1401 | 38.2%
high school | 596 | 16.2%
other | 27 | 0.7%
EDUCATION | 2 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
school | 1997 | 35.2%
university | 1644 | 29.0%
graduate | 1401 | 24.7%
high | 596 | 10.5%
other | 27 | 0.5%
education | 2 | < 0.1%

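The "Common Values (Plot)" table counts words, not whole values. Both "graduate school" and "high school" contribute to school (1401 + 596 = 1997), and the header value EDUCATION appears in lower case. The sketch below reproduces these numbers by lower-casing each value and splitting it into `\w+` tokens. The same rule explains the merged "1" entry in X6's plot table (786 + 495 = 1281), because the minus sign is not a word character. This tokenization is inferred from the counts, not taken from ydata-profiling's source. `word_counts` is an illustrative name.

```python
import re
from collections import Counter
from typing import Dict


def word_counts(value_counts: Dict[str, int]) -> Counter:
    """Weight each lower-cased word token by the count of the value it came from."""
    words: Counter = Counter()
    for value, count in value_counts.items():
        for token in re.findall(r"\w+", value.lower()):
            words[token] += count
    return words


x3 = {"university": 1644, "graduate school": 1401, "high school": 596,
      "other": 27, "EDUCATION": 2}
words = word_counts(x3)
total = sum(words.values())
for word, n in words.most_common(3):
    print(word, n, f"{n / total:.1%}")
# school 1997 35.2%
# university 1644 29.0%
# graduate 1401 24.7%
```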
Most occurring characters

Value | Count | Frequency (%)
o | 4021 | 9.1%
i | 3884 | 8.8%
s | 3641 | 8.2%
h | 3216 | 7.3%
e | 3072 | 7.0%
r | 3072 | 7.0%
t | 3072 | 7.0%
u | 3045 | 6.9%
a | 2802 | 6.3%
(space) | 1997 | 4.5%
Other values (16) | 12342 | 27.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 42149 | 95.4%
Space Separator | 1997 | 4.5%
Uppercase Letter | 18 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
o | 4021 | 9.5%
i | 3884 | 9.2%
s | 3641 | 8.6%
h | 3216 | 7.6%
e | 3072 | 7.3%
r | 3072 | 7.3%
t | 3072 | 7.3%
u | 3045 | 7.2%
a | 2802 | 6.6%
l | 1997 | 4.7%
Other values (6) | 10327 | 24.5%
Uppercase Letter
Value | Count | Frequency (%)
E | 2 | 11.1%
D | 2 | 11.1%
U | 2 | 11.1%
C | 2 | 11.1%
A | 2 | 11.1%
T | 2 | 11.1%
I | 2 | 11.1%
O | 2 | 11.1%
N | 2 | 11.1%
Space Separator
Value | Count | Frequency (%)
(space) | 1997 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 42167 | 95.5%
Common | 1997 | 4.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
o | 4021 | 9.5%
i | 3884 | 9.2%
s | 3641 | 8.6%
h | 3216 | 7.6%
e | 3072 | 7.3%
r | 3072 | 7.3%
t | 3072 | 7.3%
u | 3045 | 7.2%
a | 2802 | 6.6%
l | 1997 | 4.7%
Other values (15) | 10345 | 24.5%
Common
Value | Count | Frequency (%)
(space) | 1997 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 44164 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
o | 4021 | 9.1%
i | 3884 | 8.8%
s | 3641 | 8.2%
h | 3216 | 7.3%
e | 3072 | 7.0%
r | 3072 | 7.0%
t | 3072 | 7.0%
u | 3045 | 6.9%
a | 2802 | 6.3%
(space) | 1997 | 4.5%
Other values (16) | 12342 | 27.9%

X4
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

2: 2045
1: 1559
3: 54
0: 10
MARRIAGE: 2

Length

Max length: 8
Median length: 1
Mean length: 1.0038147
Min length: 1

Characters and Unicode

Total characters: 3684
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MARRIAGE
2nd row: 1
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value | Count | Frequency (%)
2 | 2045 | 55.7%
1 | 1559 | 42.5%
3 | 54 | 1.5%
0 | 10 | 0.3%
MARRIAGE | 2 | 0.1%

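The 52.0% imbalance alert for X4 can be reproduced from the counts in the Common Values table. The score is taken as one minus the normalized Shannon entropy, 1 - H / log2(k) for k distinct values. That is our understanding of how ydata-profiling defines the score, and the match with the reported figure supports it, but treat the formula as an assumption. `imbalance_score` is an illustrative name.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """1 - H / log2(k): 0 for a uniform distribution, approaching 1 as one value dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single value has no balance to measure
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# X4 (MARRIAGE) value counts, including the two "MARRIAGE" values
x4_counts = [2045, 1559, 54, 10, 2]
print(f"{imbalance_score(x4_counts):.1%}")  # 52.0%
```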
Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2 | 2045 | 55.7%
1 | 1559 | 42.5%
3 | 54 | 1.5%
0 | 10 | 0.3%
marriage | 2 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
2 | 2045 | 55.5%
1 | 1559 | 42.3%
3 | 54 | 1.5%
0 | 10 | 0.3%
A | 4 | 0.1%
R | 4 | 0.1%
M | 2 | 0.1%
I | 2 | 0.1%
G | 2 | 0.1%
E | 2 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3668 | 99.6%
Uppercase Letter | 16 | 0.4%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 4 | 25.0%
R | 4 | 25.0%
M | 2 | 12.5%
I | 2 | 12.5%
G | 2 | 12.5%
E | 2 | 12.5%
Decimal Number
Value | Count | Frequency (%)
2 | 2045 | 55.8%
1 | 1559 | 42.5%
3 | 54 | 1.5%
0 | 10 | 0.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3668 | 99.6%
Latin | 16 | 0.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 4 | 25.0%
R | 4 | 25.0%
M | 2 | 12.5%
I | 2 | 12.5%
G | 2 | 12.5%
E | 2 | 12.5%
Common
Value | Count | Frequency (%)
2 | 2045 | 55.8%
1 | 1559 | 42.5%
3 | 54 | 1.5%
0 | 10 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3684 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 2045 | 55.5%
1 | 1559 | 42.3%
3 | 54 | 1.5%
0 | 10 | 0.3%
A | 4 | 0.1%
R | 4 | 0.1%
M | 2 | 0.1%
I | 2 | 0.1%
G | 2 | 0.1%
E | 2 | 0.1%

X5
Text

Distinct: 53
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 3
Median length: 2
Mean length: 2.000545
Min length: 2

Characters and Unicode

Total characters: 7342
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: AGE
2nd row: 24
3rd row: 26
4th row: 34
5th row: 37

Value | Count | Frequency (%)
29 | 214 | 5.8%
27 | 185 | 5.0%
30 | 174 | 4.7%
26 | 158 | 4.3%
24 | 155 | 4.2%
32 | 152 | 4.1%
34 | 151 | 4.1%
28 | 147 | 4.0%
31 | 145 | 4.0%
35 | 135 | 3.7%
Other values (43) | 2054 | 56.0%

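The frequency tables list the ten most common values and fold the rest into an "Other values (n)" row. For X5, 53 distinct values minus the 10 listed leaves 43 values in that row, which together account for 2054 rows. A minimal sketch of that folding, with `top_k_with_other` as an illustrative helper and k = 10 assumed from the tables in this report:

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def top_k_with_other(
    values: Iterable[Hashable], k: int = 10
) -> Tuple[List[Tuple[Hashable, int]], Tuple[int, int]]:
    """Return the k most common (value, count) pairs plus
    (number of remaining distinct values, their combined count)."""
    ranked = Counter(values).most_common()
    top, rest = ranked[:k], ranked[k:]
    return top, (len(rest), sum(count for _, count in rest))


ages = ["29", "29", "27", "30", "30", "30", "41"]
top, other = top_k_with_other(ages, k=2)
print(top)    # [('30', 3), ('29', 2)]
print(other)  # (2, 2)
```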
Most occurring characters

Value | Count | Frequency (%)
3 | 1760 | 24.0%
2 | 1584 | 21.6%
4 | 1159 | 15.8%
5 | 649 | 8.8%
6 | 426 | 5.8%
7 | 408 | 5.6%
9 | 387 | 5.3%
0 | 334 | 4.5%
8 | 332 | 4.5%
1 | 297 | 4.0%
Other values (3) | 6 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 7336 | 99.9%
Uppercase Letter | 6 | 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
3 | 1760 | 24.0%
2 | 1584 | 21.6%
4 | 1159 | 15.8%
5 | 649 | 8.8%
6 | 426 | 5.8%
7 | 408 | 5.6%
9 | 387 | 5.3%
0 | 334 | 4.6%
8 | 332 | 4.5%
1 | 297 | 4.0%
Uppercase Letter
Value | Count | Frequency (%)
A | 2 | 33.3%
G | 2 | 33.3%
E | 2 | 33.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 7336 | 99.9%
Latin | 6 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
3 | 1760 | 24.0%
2 | 1584 | 21.6%
4 | 1159 | 15.8%
5 | 649 | 8.8%
6 | 426 | 5.8%
7 | 408 | 5.6%
9 | 387 | 5.3%
0 | 334 | 4.6%
8 | 332 | 4.5%
1 | 297 | 4.0%
Latin
Value | Count | Frequency (%)
A | 2 | 33.3%
G | 2 | 33.3%
E | 2 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7342 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
3 | 1760 | 24.0%
2 | 1584 | 21.6%
4 | 1159 | 15.8%
5 | 649 | 8.8%
6 | 426 | 5.8%
7 | 408 | 5.6%
9 | 387 | 5.3%
0 | 334 | 4.5%
8 | 332 | 4.5%
1 | 297 | 4.0%
Other values (3) | 6 | 0.1%

X6
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

0: 1741
-1: 786
1: 495
2: 327
-2: 275
Other values (5): 46

Length

Max length: 5
Median length: 1
Mean length: 1.2912807
Min length: 1

Characters and Unicode

Total characters: 4739
Distinct characters: 12
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: PAY_0
2nd row: 2
3rd row: -1
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1741 | 47.4%
-1 | 786 | 21.4%
1 | 495 | 13.5%
2 | 327 | 8.9%
-2 | 275 | 7.5%
3 | 25 | 0.7%
4 | 9 | 0.2%
8 | 9 | 0.2%
PAY_0 | 2 | 0.1%
7 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 1741 | 47.4%
1 | 1281 | 34.9%
2 | 602 | 16.4%
3 | 25 | 0.7%
4 | 9 | 0.2%
8 | 9 | 0.2%
pay_0 | 2 | 0.1%
7 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 1743 | 36.8%
1 | 1281 | 27.0%
- | 1061 | 22.4%
2 | 602 | 12.7%
3 | 25 | 0.5%
4 | 9 | 0.2%
8 | 9 | 0.2%
P | 2 | < 0.1%
A | 2 | < 0.1%
Y | 2 | < 0.1%
Other values (2) | 3 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3670 | 77.4%
Dash Punctuation | 1061 | 22.4%
Uppercase Letter | 6 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1743 | 47.5%
1 | 1281 | 34.9%
2 | 602 | 16.4%
3 | 25 | 0.7%
4 | 9 | 0.2%
8 | 9 | 0.2%
7 | 1 | < 0.1%
Uppercase Letter
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 1061 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4733 | 99.9%
Latin | 6 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1743 | 36.8%
1 | 1281 | 27.1%
- | 1061 | 22.4%
2 | 602 | 12.7%
3 | 25 | 0.5%
4 | 9 | 0.2%
8 | 9 | 0.2%
_ | 2 | < 0.1%
7 | 1 | < 0.1%
Latin
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4739 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1743 | 36.8%
1 | 1281 | 27.0%
- | 1061 | 22.4%
2 | 602 | 12.7%
3 | 25 | 0.5%
4 | 9 | 0.2%
8 | 9 | 0.2%
P | 2 | < 0.1%
A | 2 | < 0.1%
Y | 2 | < 0.1%
Other values (2) | 3 | 0.1%

X7
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

0: 1901
-1: 787
2: 487
-2: 442
3: 31
Other values (6): 22

Length

Max length: 5
Median length: 1
Mean length: 1.3370572
Min length: 1

Characters and Unicode

Total characters: 4907
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAY_2
2nd row: 2
3rd row: 2
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1901 | 51.8%
-1 | 787 | 21.4%
2 | 487 | 13.3%
-2 | 442 | 12.0%
3 | 31 | 0.8%
7 | 9 | 0.2%
4 | 4 | 0.1%
1 | 3 | 0.1%
PAY_2 | 2 | 0.1%
5 | 2 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 1901 | 51.8%
2 | 929 | 25.3%
1 | 790 | 21.5%
3 | 31 | 0.8%
7 | 9 | 0.2%
4 | 4 | 0.1%
pay_2 | 2 | 0.1%
5 | 2 | 0.1%
6 | 2 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 1901 | 38.7%
- | 1229 | 25.0%
2 | 931 | 19.0%
1 | 790 | 16.1%
3 | 31 | 0.6%
7 | 9 | 0.2%
4 | 4 | 0.1%
P | 2 | < 0.1%
A | 2 | < 0.1%
Y | 2 | < 0.1%
Other values (3) | 6 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3670 | 74.8%
Dash Punctuation | 1229 | 25.0%
Uppercase Letter | 6 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1901 | 51.8%
2 | 931 | 25.4%
1 | 790 | 21.5%
3 | 31 | 0.8%
7 | 9 | 0.2%
4 | 4 | 0.1%
5 | 2 | 0.1%
6 | 2 | 0.1%
Uppercase Letter
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 1229 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4901 | 99.9%
Latin | 6 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1901 | 38.8%
- | 1229 | 25.1%
2 | 931 | 19.0%
1 | 790 | 16.1%
3 | 31 | 0.6%
7 | 9 | 0.2%
4 | 4 | 0.1%
_ | 2 | < 0.1%
5 | 2 | < 0.1%
6 | 2 | < 0.1%
Latin
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4907 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1901 | 38.7%
- | 1229 | 25.0%
2 | 931 | 19.0%
1 | 790 | 16.1%
3 | 31 | 0.6%
7 | 9 | 0.2%
4 | 4 | 0.1%
P | 2 | < 0.1%
A | 2 | < 0.1%
Y | 2 | < 0.1%
Other values (3) | 6 | 0.1%

X8
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

0: 1875
-1: 797
-2: 475
2: 468
4: 14
Other values (6): 41

Length

Max length: 5
Median length: 1
Mean length: 1.3487738
Min length: 1

Characters and Unicode

Total characters: 4950
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAY_3
2nd row: -1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1875 | 51.1%
-1 | 797 | 21.7%
-2 | 475 | 12.9%
2 | 468 | 12.8%
4 | 14 | 0.4%
3 | 11 | 0.3%
7 | 10 | 0.3%
6 | 9 | 0.2%
5 | 6 | 0.2%
1 | 3 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 1875 | 51.1%
2 | 943 | 25.7%
1 | 800 | 21.8%
4 | 14 | 0.4%
3 | 11 | 0.3%
7 | 10 | 0.3%
6 | 9 | 0.2%
5 | 6 | 0.2%
pay_3 | 2 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 1875 | 37.9%
- | 1272 | 25.7%
2 | 943 | 19.1%
1 | 800 | 16.2%
4 | 14 | 0.3%
3 | 13 | 0.3%
7 | 10 | 0.2%
6 | 9 | 0.2%
5 | 6 | 0.1%
P | 2 | < 0.1%
Other values (3) | 6 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3670 | 74.1%
Dash Punctuation | 1272 | 25.7%
Uppercase Letter | 6 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1875 | 51.1%
2 | 943 | 25.7%
1 | 800 | 21.8%
4 | 14 | 0.4%
3 | 13 | 0.4%
7 | 10 | 0.3%
6 | 9 | 0.2%
5 | 6 | 0.2%
Uppercase Letter
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 1272 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4944 | 99.9%
Latin | 6 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1875 | 37.9%
- | 1272 | 25.7%
2 | 943 | 19.1%
1 | 800 | 16.2%
4 | 14 | 0.3%
3 | 13 | 0.3%
7 | 10 | 0.2%
6 | 9 | 0.2%
5 | 6 | 0.1%
_ | 2 | < 0.1%
Latin
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4950 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1875 | 37.9%
- | 1272 | 25.7%
2 | 943 | 19.1%
1 | 800 | 16.2%
4 | 14 | 0.3%
3 | 13 | 0.3%
7 | 10 | 0.2%
6 | 9 | 0.2%
5 | 6 | 0.1%
P | 2 | < 0.1%
Other values (3) | 6 | 0.1%

X9
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

0: 1995
-1: 753
-2: 538
2: 322
3: 29
Other values (5): 33

Length

Max length: 5
Median length: 1
Mean length: 1.353951
Min length: 1

Characters and Unicode

Total characters: 4969
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: PAY_4
2nd row: -1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1995 | 54.4%
-1 | 753 | 20.5%
-2 | 538 | 14.7%
2 | 322 | 8.8%
3 | 29 | 0.8%
5 | 12 | 0.3%
4 | 9 | 0.2%
7 | 9 | 0.2%
PAY_4 | 2 | 0.1%
6 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 1995 | 54.4%
2 | 860 | 23.4%
1 | 753 | 20.5%
3 | 29 | 0.8%
5 | 12 | 0.3%
4 | 9 | 0.2%
7 | 9 | 0.2%
pay_4 | 2 | 0.1%
6 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 1995 | 40.1%
- | 1291 | 26.0%
2 | 860 | 17.3%
1 | 753 | 15.2%
3 | 29 | 0.6%
5 | 12 | 0.2%
4 | 11 | 0.2%
7 | 9 | 0.2%
P | 2 | < 0.1%
A | 2 | < 0.1%
Other values (3) | 5 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3670 | 73.9%
Dash Punctuation | 1291 | 26.0%
Uppercase Letter | 6 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1995 | 54.4%
2 | 860 | 23.4%
1 | 753 | 20.5%
3 | 29 | 0.8%
5 | 12 | 0.3%
4 | 11 | 0.3%
7 | 9 | 0.2%
6 | 1 | < 0.1%
Uppercase Letter
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 1291 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4963 | 99.9%
Latin | 6 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1995 | 40.2%
- | 1291 | 26.0%
2 | 860 | 17.3%
1 | 753 | 15.2%
3 | 29 | 0.6%
5 | 12 | 0.2%
4 | 11 | 0.2%
7 | 9 | 0.2%
_ | 2 | < 0.1%
6 | 1 | < 0.1%
Latin
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4969 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1995 | 40.1%
- | 1291 | 26.0%
2 | 860 | 17.3%
1 | 753 | 15.2%
3 | 29 | 0.6%
5 | 12 | 0.2%
4 | 11 | 0.2%
7 | 9 | 0.2%
P | 2 | < 0.1%
A | 2 | < 0.1%
Other values (3) | 5 | 0.1%

X10
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

0: 1996
-1: 749
-2: 547
2: 328
3: 18
Other values (4): 32

Length

Max length: 5
Median length: 1
Mean length: 1.3553134
Min length: 1

Characters and Unicode

Total characters: 4974
Distinct characters: 12
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAY_5
2nd row: -2
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1996 | 54.4%
-1 | 749 | 20.4%
-2 | 547 | 14.9%
2 | 328 | 8.9%
3 | 18 | 0.5%
4 | 18 | 0.5%
7 | 10 | 0.3%
PAY_5 | 2 | 0.1%
5 | 2 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 1996 | 54.4%
2 | 875 | 23.8%
1 | 749 | 20.4%
3 | 18 | 0.5%
4 | 18 | 0.5%
7 | 10 | 0.3%
pay_5 | 2 | 0.1%
5 | 2 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 1996 | 40.1%
- | 1296 | 26.1%
2 | 875 | 17.6%
1 | 749 | 15.1%
3 | 18 | 0.4%
4 | 18 | 0.4%
7 | 10 | 0.2%
5 | 4 | 0.1%
P | 2 | < 0.1%
A | 2 | < 0.1%
Other values (2) | 4 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3670 | 73.8%
Dash Punctuation | 1296 | 26.1%
Uppercase Letter | 6 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1996 | 54.4%
2 | 875 | 23.8%
1 | 749 | 20.4%
3 | 18 | 0.5%
4 | 18 | 0.5%
7 | 10 | 0.3%
5 | 4 | 0.1%
Uppercase Letter
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 1296 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4968 | 99.9%
Latin | 6 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1996 | 40.2%
- | 1296 | 26.1%
2 | 875 | 17.6%
1 | 749 | 15.1%
3 | 18 | 0.4%
4 | 18 | 0.4%
7 | 10 | 0.2%
5 | 4 | 0.1%
_ | 2 | < 0.1%
Latin
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4974 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1996 | 40.1%
- | 1296 | 26.1%
2 | 875 | 17.6%
1 | 749 | 15.1%
3 | 18 | 0.4%
4 | 18 | 0.4%
7 | 10 | 0.2%
5 | 4 | 0.1%
P | 2 | < 0.1%
A | 2 | < 0.1%
Other values (2) | 4 | 0.1%

X11
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

0: 1879
-1: 810
-2: 583
2: 347
3: 30
Other values (5): 21

Length

Max length: 5
Median length: 1
Mean length: 1.3817439
Min length: 1

Characters and Unicode

Total characters: 5071
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: PAY_6
2nd row: -2
3rd row: 2
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 1879 | 51.2%
-1 | 810 | 22.1%
-2 | 583 | 15.9%
2 | 347 | 9.5%
3 | 30 | 0.8%
6 | 8 | 0.2%
7 | 6 | 0.2%
4 | 4 | 0.1%
PAY_6 | 2 | 0.1%
8 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 1879 | 51.2%
2 | 930 | 25.3%
1 | 810 | 22.1%
3 | 30 | 0.8%
6 | 8 | 0.2%
7 | 6 | 0.2%
4 | 4 | 0.1%
pay_6 | 2 | 0.1%
8 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 1879 | 37.1%
- | 1393 | 27.5%
2 | 930 | 18.3%
1 | 810 | 16.0%
3 | 30 | 0.6%
6 | 10 | 0.2%
7 | 6 | 0.1%
4 | 4 | 0.1%
P | 2 | < 0.1%
A | 2 | < 0.1%
Other values (3) | 5 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3670 | 72.4%
Dash Punctuation | 1393 | 27.5%
Uppercase Letter | 6 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1879 | 51.2%
2 | 930 | 25.3%
1 | 810 | 22.1%
3 | 30 | 0.8%
6 | 10 | 0.3%
7 | 6 | 0.2%
4 | 4 | 0.1%
8 | 1 | < 0.1%
Uppercase Letter
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 1393 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 5065 | 99.9%
Latin | 6 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 1879 | 37.1%
- | 1393 | 27.5%
2 | 930 | 18.4%
1 | 810 | 16.0%
3 | 30 | 0.6%
6 | 10 | 0.2%
7 | 6 | 0.1%
4 | 4 | 0.1%
_ | 2 | < 0.1%
8 | 1 | < 0.1%
Latin
Value | Count | Frequency (%)
P | 2 | 33.3%
A | 2 | 33.3%
Y | 2 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5071 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 1879 | 37.1%
- | 1393 | 27.5%
2 | 930 | 18.3%
1 | 810 | 16.0%
3 | 30 | 0.6%
6 | 10 | 0.2%
7 | 6 | 0.1%
4 | 4 | 0.1%
P | 2 | < 0.1%
A | 2 | < 0.1%
Other values (3) | 5 | 0.1%

X12
Text

Distinct: 2138
Distinct (%): 58.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 9
Median length: 6
Mean length: 4.5122616
Min length: 1

Characters and Unicode

Total characters: 16560
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 980
Unique (%): 26.7%

Sample

1st row: BILL_AMT1
2nd row: 3913
3rd row: 2682
4th row: 29239
5th row: 46990

Value | Count | Frequency (%)
0 | 244 | 6.6%
390 | 31 | 0.8%
780 | 12 | 0.3%
316 | 11 | 0.3%
396 | 10 | 0.3%
200 | 8 | 0.2%
2400 | 7 | 0.2%
291 | 7 | 0.2%
2000 | 7 | 0.2%
819 | 6 | 0.2%
Other values (2120) | 3327 | 90.7%

Most occurring characters

Value | Count | Frequency (%)
1 | 2375 | 14.3%
0 | 1894 | 11.4%
2 | 1794 | 10.8%
4 | 1678 | 10.1%
3 | 1595 | 9.6%
5 | 1454 | 8.8%
6 | 1449 | 8.8%
8 | 1414 | 8.5%
9 | 1406 | 8.5%
7 | 1402 | 8.5%
Other values (8) | 99 | 0.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 16461 | 99.4%
Dash Punctuation | 83 | 0.5%
Uppercase Letter | 14 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 2375 | 14.4%
0 | 1894 | 11.5%
2 | 1794 | 10.9%
4 | 1678 | 10.2%
3 | 1595 | 9.7%
5 | 1454 | 8.8%
6 | 1449 | 8.8%
8 | 1414 | 8.6%
9 | 1406 | 8.5%
7 | 1402 | 8.5%
Uppercase Letter
Value | Count | Frequency (%)
L | 4 | 28.6%
B | 2 | 14.3%
I | 2 | 14.3%
A | 2 | 14.3%
M | 2 | 14.3%
T | 2 | 14.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 83 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 16546 | 99.9%
Latin | 14 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 2375 | 14.4%
0 | 1894 | 11.4%
2 | 1794 | 10.8%
4 | 1678 | 10.1%
3 | 1595 | 9.6%
5 | 1454 | 8.8%
6 | 1449 | 8.8%
8 | 1414 | 8.5%
9 | 1406 | 8.5%
7 | 1402 | 8.5%
Other values (2) | 85 | 0.5%
Latin
Value | Count | Frequency (%)
L | 4 | 28.6%
B | 2 | 14.3%
I | 2 | 14.3%
A | 2 | 14.3%
M | 2 | 14.3%
T | 2 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 16560 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 2375 | 14.3%
0 | 1894 | 11.4%
2 | 1794 | 10.8%
4 | 1678 | 10.1%
3 | 1595 | 9.6%
5 | 1454 | 8.8%
6 | 1449 | 8.8%
8 | 1414 | 8.5%
9 | 1406 | 8.5%
7 | 1402 | 8.5%
Other values (8) | 99 | 0.6%

X13
Text

Distinct: 2092
Distinct (%): 57.0%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 9
Median length: 6
Mean length: 4.439782
Min length: 1

Characters and Unicode

Total characters: 16294
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 971
Unique (%): 26.5%

Sample

1st row: BILL_AMT2
2nd row: 3102
3rd row: 1725
4th row: 14027
5th row: 48233

Value | Count | Frequency (%)
0 | 328 | 8.9%
390 | 17 | 0.5%
200 | 14 | 0.4%
316 | 13 | 0.4%
291 | 10 | 0.3%
780 | 9 | 0.2%
326 | 8 | 0.2%
300 | 8 | 0.2%
396 | 7 | 0.2%
2400 | 7 | 0.2%
Other values (2078) | 3249 | 88.5%

Most occurring characters

Value | Count | Frequency (%)
1 | 2260 | 13.9%
0 | 1965 | 12.1%
2 | 1853 | 11.4%
3 | 1610 | 9.9%
5 | 1483 | 9.1%
6 | 1469 | 9.0%
4 | 1461 | 9.0%
9 | 1365 | 8.4%
7 | 1364 | 8.4%
8 | 1356 | 8.3%
Other values (8) | 108 | 0.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 16186 | 99.3%
Dash Punctuation | 92 | 0.6%
Uppercase Letter | 14 | 0.1%
Connector Punctuation | 2 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 2260 | 14.0%
0 | 1965 | 12.1%
2 | 1853 | 11.4%
3 | 1610 | 9.9%
5 | 1483 | 9.2%
6 | 1469 | 9.1%
4 | 1461 | 9.0%
9 | 1365 | 8.4%
7 | 1364 | 8.4%
8 | 1356 | 8.4%
Uppercase Letter
Value | Count | Frequency (%)
L | 4 | 28.6%
B | 2 | 14.3%
I | 2 | 14.3%
A | 2 | 14.3%
M | 2 | 14.3%
T | 2 | 14.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 92 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 16280 | 99.9%
Latin | 14 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 2260 | 13.9%
0 | 1965 | 12.1%
2 | 1853 | 11.4%
3 | 1610 | 9.9%
5 | 1483 | 9.1%
6 | 1469 | 9.0%
4 | 1461 | 9.0%
9 | 1365 | 8.4%
7 | 1364 | 8.4%
8 | 1356 | 8.3%
Other values (2) | 94 | 0.6%
Latin
Value | Count | Frequency (%)
L | 4 | 28.6%
B | 2 | 14.3%
I | 2 | 14.3%
A | 2 | 14.3%
M | 2 | 14.3%
T | 2 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 16294 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 2260 | 13.9%
0 | 1965 | 12.1%
2 | 1853 | 11.4%
3 | 1610 | 9.9%
5 | 1483 | 9.1%
6 | 1469 | 9.0%
4 | 1461 | 9.0%
9 | 1365 | 8.4%
7 | 1364 | 8.4%
8 | 1356 | 8.3%
Other values (8) | 108 | 0.7%

X14
Text

Distinct2045
Distinct (%)55.7%
Missing0
Missing (%)0.0%
Memory size28.8 KiB
2023-12-10T14:30:59.710093image/svg+xmlMatplotlib v3.7.2, https://matplotlib.org/

Length

Max length9
Median length6
Mean length4.3613079
Min length1

Characters and Unicode

Total characters16006
Distinct characters18
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique939 ?
Unique (%)25.6%

Sample

1st row: BILL_AMT3
2nd row: 689
3rd row: 2682
4th row: 13559
5th row: 49291

Most occurring words

Value  Count  Frequency (%)
0  384  10.5%
390  31  0.8%
200  12  0.3%
780  11  0.3%
2  9  0.2%
291  9  0.2%
316  8  0.2%
396  7  0.2%
2400  7  0.2%
326  6  0.2%
Other values (2031)  3186  86.8%
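The words table keeps the ten most frequent tokens and folds the rest into a single "Other values (k)" row; for X14 the ten listed rows total 484 and the remaining 2031 tokens total 3186, together 3670. The sketch below reproduces that aggregation on a hypothetical token list; how the report tokenizes the raw values is not assumed here.

```python
from collections import Counter

def top_with_other(tokens: list[str], n: int = 10) -> list[tuple[str, int, str]]:
    """Top-n (value, count, frequency) rows plus one 'Other values (k)' row for the rest."""
    counts = Counter(tokens)
    total = sum(counts.values())
    top = counts.most_common(n)
    rows = [(value, c, f"{100 * c / total:.1f}%") for value, c in top]
    rest = len(counts) - len(top)  # number of distinct tokens not shown
    if rest > 0:
        rest_count = total - sum(c for _, c in top)
        rows.append((f"Other values ({rest})", rest_count, f"{100 * rest_count / total:.1f}%"))
    return rows

tokens = ["0", "0", "0", "390", "390", "200", "780", "2"]
for row in top_with_other(tokens, n=2):
    print(row)
```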

Most occurring characters

Value  Count  Frequency (%)
1  2274  14.2%
0  1982  12.4%
2  1881  11.8%
3  1610  10.1%
4  1415  8.8%
6  1394  8.7%
9  1379  8.6%
5  1359  8.5%
8  1350  8.4%
7  1264  7.9%
Other values (8)  98  0.6%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  15908  99.4%
Dash Punctuation  82  0.5%
Uppercase Letter  14  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  2274  14.3%
0  1982  12.5%
2  1881  11.8%
3  1610  10.1%
4  1415  8.9%
6  1394  8.8%
9  1379  8.7%
5  1359  8.5%
8  1350  8.5%
7  1264  7.9%
Uppercase Letter
Value  Count  Frequency (%)
L  4  28.6%
B  2  14.3%
I  2  14.3%
A  2  14.3%
M  2  14.3%
T  2  14.3%
Dash Punctuation
Value  Count  Frequency (%)
-  82  100.0%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%
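Assuming each negative amount carries a single leading minus sign, the Dash Punctuation count equals the number of negative values (82 for X14). The sketch checks this on a hypothetical toy column; entries that do not parse as integers, such as an embedded column name, are skipped.

```python
def count_negative(values: list[str]) -> int:
    """Count values that parse as negative integers; non-numeric entries are skipped."""
    n = 0
    for v in values:
        try:
            if int(v) < 0:
                n += 1
        except ValueError:
            continue  # e.g. a header name such as "BILL_AMT3"
    return n

def count_dashes(values: list[str]) -> int:
    """Total number of '-' characters across all values."""
    return sum(v.count("-") for v in values)

column = ["BILL_AMT3", "689", "-390", "-15", "0"]
print(count_negative(column), count_dashes(column))  # 2 2
```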

Most occurring scripts

Value  Count  Frequency (%)
Common  15992  99.9%
Latin  14  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  2274  14.2%
0  1982  12.4%
2  1881  11.8%
3  1610  10.1%
4  1415  8.8%
6  1394  8.7%
9  1379  8.6%
5  1359  8.5%
8  1350  8.4%
7  1264  7.9%
Other values (2)  84  0.5%
Latin
Value  Count  Frequency (%)
L  4  28.6%
B  2  14.3%
I  2  14.3%
A  2  14.3%
M  2  14.3%
T  2  14.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  16006  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  2274  14.2%
0  1982  12.4%
2  1881  11.8%
3  1610  10.1%
4  1415  8.8%
6  1394  8.7%
9  1379  8.6%
5  1359  8.5%
8  1350  8.4%
7  1264  7.9%
Other values (8)  98  0.6%

X15
Text

Distinct: 2009
Distinct (%): 54.7%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 9
Median length: 6
Mean length: 4.2942779
Min length: 1

Characters and Unicode

Total characters: 15760
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 916
Unique (%): 25.0%

Sample

1st row: BILL_AMT4
2nd row: 0
3rd row: 3272
4th row: 14331
5th row: 28314

Most occurring words

Value  Count  Frequency (%)
0  424  11.6%
390  25  0.7%
316  15  0.4%
291  10  0.3%
326  9  0.2%
300  8  0.2%
150  8  0.2%
2400  7  0.2%
416  7  0.2%
780  7  0.2%
Other values (1993)  3150  85.8%

Most occurring characters

Value  Count  Frequency (%)
1  2295  14.6%
0  1960  12.4%
2  1731  11.0%
3  1514  9.6%
4  1444  9.2%
6  1388  8.8%
8  1370  8.7%
5  1353  8.6%
9  1326  8.4%
7  1279  8.1%
Other values (8)  100  0.6%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  15660  99.4%
Dash Punctuation  84  0.5%
Uppercase Letter  14  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  2295  14.7%
0  1960  12.5%
2  1731  11.1%
3  1514  9.7%
4  1444  9.2%
6  1388  8.9%
8  1370  8.7%
5  1353  8.6%
9  1326  8.5%
7  1279  8.2%
Uppercase Letter
Value  Count  Frequency (%)
L  4  28.6%
B  2  14.3%
I  2  14.3%
A  2  14.3%
M  2  14.3%
T  2  14.3%
Dash Punctuation
Value  Count  Frequency (%)
-  84  100.0%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  15746  99.9%
Latin  14  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  2295  14.6%
0  1960  12.4%
2  1731  11.0%
3  1514  9.6%
4  1444  9.2%
6  1388  8.8%
8  1370  8.7%
5  1353  8.6%
9  1326  8.4%
7  1279  8.1%
Other values (2)  86  0.5%
Latin
Value  Count  Frequency (%)
L  4  28.6%
B  2  14.3%
I  2  14.3%
A  2  14.3%
M  2  14.3%
T  2  14.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  15760  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  2295  14.6%
0  1960  12.4%
2  1731  11.0%
3  1514  9.6%
4  1444  9.2%
6  1388  8.8%
8  1370  8.7%
5  1353  8.6%
9  1326  8.4%
7  1279  8.1%
Other values (8)  100  0.6%

X16
Text

Distinct: 1984
Distinct (%): 54.1%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 9
Median length: 6
Mean length: 4.2564033
Min length: 1

Characters and Unicode

Total characters: 15621
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 915
Unique (%): 24.9%

Sample

1st row: BILL_AMT5
2nd row: 0
3rd row: 3455
4th row: 14948
5th row: 28959

Most occurring words

Value  Count  Frequency (%)
0  460  12.5%
390  28  0.8%
150  13  0.4%
396  11  0.3%
316  11  0.3%
2000  8  0.2%
780  7  0.2%
416  7  0.2%
2400  7  0.2%
1261  6  0.2%
Other values (1967)  3112  84.8%

Most occurring characters

Value  Count  Frequency (%)
1  2224  14.2%
0  2058  13.2%
2  1660  10.6%
3  1562  10.0%
9  1464  9.4%
5  1366  8.7%
4  1324  8.5%
6  1320  8.5%
8  1301  8.3%
7  1239  7.9%
Other values (8)  103  0.7%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  15518  99.3%
Dash Punctuation  87  0.6%
Uppercase Letter  14  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  2224  14.3%
0  2058  13.3%
2  1660  10.7%
3  1562  10.1%
9  1464  9.4%
5  1366  8.8%
4  1324  8.5%
6  1320  8.5%
8  1301  8.4%
7  1239  8.0%
Uppercase Letter
Value  Count  Frequency (%)
L  4  28.6%
B  2  14.3%
I  2  14.3%
A  2  14.3%
M  2  14.3%
T  2  14.3%
Dash Punctuation
Value  Count  Frequency (%)
-  87  100.0%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  15607  99.9%
Latin  14  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  2224  14.3%
0  2058  13.2%
2  1660  10.6%
3  1562  10.0%
9  1464  9.4%
5  1366  8.8%
4  1324  8.5%
6  1320  8.5%
8  1301  8.3%
7  1239  7.9%
Other values (2)  89  0.6%
Latin
Value  Count  Frequency (%)
L  4  28.6%
B  2  14.3%
I  2  14.3%
A  2  14.3%
M  2  14.3%
T  2  14.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  15621  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  2224  14.2%
0  2058  13.2%
2  1660  10.6%
3  1562  10.0%
9  1464  9.4%
5  1366  8.7%
4  1324  8.5%
6  1320  8.5%
8  1301  8.3%
7  1239  7.9%
Other values (8)  103  0.7%

X17
Text

Distinct: 1948
Distinct (%): 53.1%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 9
Median length: 7
Mean length: 4.1618529
Min length: 1

Characters and Unicode

Total characters: 15274
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 903
Unique (%): 24.6%

Sample

1st row: BILL_AMT6
2nd row: 0
3rd row: 3261
4th row: 15549
5th row: 29547

Most occurring words

Value  Count  Frequency (%)
0  532  14.5%
390  22  0.6%
780  18  0.5%
150  17  0.5%
316  12  0.3%
326  11  0.3%
291  11  0.3%
200  8  0.2%
396  7  0.2%
1320  6  0.2%
Other values (1932)  3026  82.5%

Most occurring characters

Value  Count  Frequency (%)
1  2226  14.6%
0  1981  13.0%
2  1593  10.4%
3  1555  10.2%
9  1385  9.1%
4  1365  8.9%
5  1354  8.9%
6  1305  8.5%
8  1292  8.5%
7  1131  7.4%
Other values (8)  87  0.6%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  15187  99.4%
Dash Punctuation  71  0.5%
Uppercase Letter  14  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  2226  14.7%
0  1981  13.0%
2  1593  10.5%
3  1555  10.2%
9  1385  9.1%
4  1365  9.0%
5  1354  8.9%
6  1305  8.6%
8  1292  8.5%
7  1131  7.4%
Uppercase Letter
Value  Count  Frequency (%)
L  4  28.6%
B  2  14.3%
I  2  14.3%
A  2  14.3%
M  2  14.3%
T  2  14.3%
Dash Punctuation
Value  Count  Frequency (%)
-  71  100.0%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  15260  99.9%
Latin  14  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  2226  14.6%
0  1981  13.0%
2  1593  10.4%
3  1555  10.2%
9  1385  9.1%
4  1365  8.9%
5  1354  8.9%
6  1305  8.6%
8  1292  8.5%
7  1131  7.4%
Other values (2)  73  0.5%
Latin
Value  Count  Frequency (%)
L  4  28.6%
B  2  14.3%
I  2  14.3%
A  2  14.3%
M  2  14.3%
T  2  14.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  15274  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  2226  14.6%
0  1981  13.0%
2  1593  10.4%
3  1555  10.2%
9  1385  9.1%
4  1365  8.9%
5  1354  8.9%
6  1305  8.5%
8  1292  8.5%
7  1131  7.4%
Other values (8)  87  0.6%

X18
Text

Distinct: 1147
Distinct (%): 31.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 8
Median length: 4
Mean length: 3.5160763
Min length: 1

Characters and Unicode

Total characters: 12904
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 478
Unique (%): 13.0%

Sample

1st row: PAY_AMT1
2nd row: 0
3rd row: 0
4th row: 1518
5th row: 2000
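The first sampled value, PAY_AMT1, is a column name rather than an amount. The Uppercase Letter (12) and Connector Punctuation (2) counts for X18 match that name appearing twice (six capitals and one underscore each), which is presumably why these amount columns are profiled as Text rather than numeric. A minimal sketch that separates parseable integers from such entries; the helper name is hypothetical.

```python
def split_numeric(values: list[str]) -> tuple[list[int], list[str]]:
    """Separate entries that parse as integers from those that do not (e.g. header names)."""
    numbers: list[int] = []
    other: list[str] = []
    for v in values:
        try:
            numbers.append(int(v))
        except ValueError:
            other.append(v)
    return numbers, other

column = ["PAY_AMT1", "0", "0", "1518", "2000", "PAY_AMT1"]
numbers, other = split_numeric(column)
print(numbers)  # [0, 0, 1518, 2000]
print(other)    # ['PAY_AMT1', 'PAY_AMT1']
```

With pandas, `pd.to_numeric(series, errors="coerce")` performs a similar conversion, turning unparseable entries into NaN so they can be inspected or dropped.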

Most occurring words

Value  Count  Frequency (%)
0  667  18.2%
2000  147  4.0%
3000  102  2.8%
5000  76  2.1%
10000  64  1.7%
1000  62  1.7%
2500  60  1.6%
1500  55  1.5%
4000  47  1.3%
1600  30  0.8%
Other values (1137)  2360  64.3%

Most occurring characters

Value  Count  Frequency (%)
0  4789  37.1%
1  1670  12.9%
2  1266  9.8%
3  1022  7.9%
5  955  7.4%
4  769  6.0%
6  758  5.9%
7  626  4.9%
8  580  4.5%
9  455  3.5%
Other values (6)  14  0.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  12890  99.9%
Uppercase Letter  12  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  4789  37.2%
1  1670  13.0%
2  1266  9.8%
3  1022  7.9%
5  955  7.4%
4  769  6.0%
6  758  5.9%
7  626  4.9%
8  580  4.5%
9  455  3.5%
Uppercase Letter
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  12892  99.9%
Latin  12  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  4789  37.1%
1  1670  13.0%
2  1266  9.8%
3  1022  7.9%
5  955  7.4%
4  769  6.0%
6  758  5.9%
7  626  4.9%
8  580  4.5%
9  455  3.5%
Latin
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12904  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  4789  37.1%
1  1670  12.9%
2  1266  9.8%
3  1022  7.9%
5  955  7.4%
4  769  6.0%
6  758  5.9%
7  626  4.9%
8  580  4.5%
9  455  3.5%
Other values (6)  14  0.1%

X19
Text

Distinct: 1130
Distinct (%): 30.8%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 8
Median length: 4
Mean length: 3.4351499
Min length: 1

Characters and Unicode

Total characters: 12607
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 468
Unique (%): 12.8%

Sample

1st row: PAY_AMT2
2nd row: 689
3rd row: 1000
4th row: 1500
5th row: 2019

Most occurring words

Value  Count  Frequency (%)
0  708  19.3%
2000  146  4.0%
1000  95  2.6%
5000  95  2.6%
3000  94  2.6%
1500  89  2.4%
1200  39  1.1%
4000  38  1.0%
1400  34  0.9%
390  34  0.9%
Other values (1120)  2298  62.6%

Most occurring characters

Value  Count  Frequency (%)
0  4635  36.8%
1  1730  13.7%
2  1224  9.7%
3  1009  8.0%
5  968  7.7%
4  712  5.6%
6  640  5.1%
7  591  4.7%
9  567  4.5%
8  517  4.1%
Other values (6)  14  0.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  12593  99.9%
Uppercase Letter  12  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  4635  36.8%
1  1730  13.7%
2  1224  9.7%
3  1009  8.0%
5  968  7.7%
4  712  5.7%
6  640  5.1%
7  591  4.7%
9  567  4.5%
8  517  4.1%
Uppercase Letter
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  12595  99.9%
Latin  12  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  4635  36.8%
1  1730  13.7%
2  1224  9.7%
3  1009  8.0%
5  968  7.7%
4  712  5.7%
6  640  5.1%
7  591  4.7%
9  567  4.5%
8  517  4.1%
Latin
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12607  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  4635  36.8%
1  1730  13.7%
2  1224  9.7%
3  1009  8.0%
5  968  7.7%
4  712  5.6%
6  640  5.1%
7  591  4.7%
9  567  4.5%
8  517  4.1%
Other values (6)  14  0.1%

X20
Text

Distinct: 1041
Distinct (%): 28.4%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 8
Median length: 4
Mean length: 3.2615804
Min length: 1

Characters and Unicode

Total characters: 11970
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 395
Unique (%): 10.8%

Sample

1st row: PAY_AMT3
2nd row: 0
3rd row: 1000
4th row: 1000
5th row: 1200

Most occurring words

Value  Count  Frequency (%)
0  798  21.7%
1000  172  4.7%
2000  152  4.1%
3000  100  2.7%
5000  82  2.2%
1500  55  1.5%
4000  48  1.3%
10000  44  1.2%
2500  29  0.8%
6000  29  0.8%
Other values (1031)  2161  58.9%

Most occurring characters

Value  Count  Frequency (%)
0  4835  40.4%
1  1492  12.5%
2  1036  8.7%
3  888  7.4%
5  883  7.4%
6  694  5.8%
4  596  5.0%
7  538  4.5%
8  516  4.3%
9  478  4.0%
Other values (6)  14  0.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  11956  99.9%
Uppercase Letter  12  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  4835  40.4%
1  1492  12.5%
2  1036  8.7%
3  888  7.4%
5  883  7.4%
6  694  5.8%
4  596  5.0%
7  538  4.5%
8  516  4.3%
9  478  4.0%
Uppercase Letter
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  11958  99.9%
Latin  12  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  4835  40.4%
1  1492  12.5%
2  1036  8.7%
3  888  7.4%
5  883  7.4%
6  694  5.8%
4  596  5.0%
7  538  4.5%
8  516  4.3%
9  478  4.0%
Latin
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  11970  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  4835  40.4%
1  1492  12.5%
2  1036  8.7%
3  888  7.4%
5  883  7.4%
6  694  5.8%
4  596  5.0%
7  538  4.5%
8  516  4.3%
9  478  4.0%
Other values (6)  14  0.1%

X21
Text

Distinct: 1035
Distinct (%): 28.2%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 8
Median length: 4
Mean length: 3.2569482
Min length: 1

Characters and Unicode

Total characters: 11953
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 416
Unique (%): 11.3%

Sample

1st row: PAY_AMT4
2nd row: 0
3rd row: 1000
4th row: 1000
5th row: 1100

Most occurring words

Value  Count  Frequency (%)
0  808  22.0%
1000  160  4.4%
2000  142  3.9%
3000  94  2.6%
5000  91  2.5%
1500  71  1.9%
4000  49  1.3%
500  41  1.1%
2500  37  1.0%
10000  36  1.0%
Other values (1025)  2141  58.3%

Most occurring characters

Value  Count  Frequency (%)
0  4848  40.6%
1  1406  11.8%
2  994  8.3%
5  934  7.8%
3  857  7.2%
4  631  5.3%
6  631  5.3%
7  567  4.7%
9  554  4.6%
8  517  4.3%
Other values (6)  14  0.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  11939  99.9%
Uppercase Letter  12  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  4848  40.6%
1  1406  11.8%
2  994  8.3%
5  934  7.8%
3  857  7.2%
4  631  5.3%
6  631  5.3%
7  567  4.7%
9  554  4.6%
8  517  4.3%
Uppercase Letter
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  11941  99.9%
Latin  12  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  4848  40.6%
1  1406  11.8%
2  994  8.3%
5  934  7.8%
3  857  7.2%
4  631  5.3%
6  631  5.3%
7  567  4.7%
9  554  4.6%
8  517  4.3%
Latin
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  11953  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  4848  40.6%
1  1406  11.8%
2  994  8.3%
5  934  7.8%
3  857  7.2%
4  631  5.3%
6  631  5.3%
7  567  4.7%
9  554  4.6%
8  517  4.3%
Other values (6)  14  0.1%

X22
Text

Distinct: 1039
Distinct (%): 28.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 8
Median length: 4
Mean length: 3.2517711
Min length: 1

Characters and Unicode

Total characters: 11934
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 420
Unique (%): 11.4%

Sample

1st row: PAY_AMT5
2nd row: 0
3rd row: 0
4th row: 1000
5th row: 1069

Most occurring words

Value  Count  Frequency (%)
0  827  22.5%
1000  163  4.4%
2000  123  3.4%
3000  119  3.2%
5000  77  2.1%
1500  73  2.0%
4000  48  1.3%
2500  33  0.9%
500  28  0.8%
3500  27  0.7%
Other values (1029)  2152  58.6%

Most occurring characters

Value  Count  Frequency (%)
0  4740  39.7%
1  1406  11.8%
2  1003  8.4%
3  938  7.9%
5  922  7.7%
4  674  5.6%
6  637  5.3%
7  557  4.7%
8  526  4.4%
9  517  4.3%
Other values (6)  14  0.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  11920  99.9%
Uppercase Letter  12  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  4740  39.8%
1  1406  11.8%
2  1003  8.4%
3  938  7.9%
5  922  7.7%
4  674  5.7%
6  637  5.3%
7  557  4.7%
8  526  4.4%
9  517  4.3%
Uppercase Letter
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  11922  99.9%
Latin  12  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  4740  39.8%
1  1406  11.8%
2  1003  8.4%
3  938  7.9%
5  922  7.7%
4  674  5.7%
6  637  5.3%
7  557  4.7%
8  526  4.4%
9  517  4.3%
Latin
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  11934  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  4740  39.7%
1  1406  11.8%
2  1003  8.4%
3  938  7.9%
5  922  7.7%
4  674  5.6%
6  637  5.3%
7  557  4.7%
8  526  4.4%
9  517  4.3%
Other values (6)  14  0.1%

X23
Text

Distinct: 971
Distinct (%): 26.5%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 8
Median length: 4
Mean length: 3.1525886
Min length: 1

Characters and Unicode

Total characters: 11570
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 400
Unique (%): 10.9%

Sample

1st row: PAY_AMT6
2nd row: 0
3rd row: 2000
4th row: 5000
5th row: 1000

Most occurring words

Value  Count  Frequency (%)
0  949  25.9%
1000  176  4.8%
2000  163  4.4%
5000  96  2.6%
3000  94  2.6%
1500  62  1.7%
4000  57  1.6%
10000  39  1.1%
2500  37  1.0%
6000  28  0.8%
Other values (961)  1969  53.7%

Most occurring characters

Value  Count  Frequency (%)
0  4906  42.4%
1  1279  11.1%
2  988  8.5%
5  867  7.5%
3  793  6.9%
6  622  5.4%
4  621  5.4%
7  533  4.6%
8  474  4.1%
9  473  4.1%
Other values (6)  14  0.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  11556  99.9%
Uppercase Letter  12  0.1%
Connector Punctuation  2  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  4906  42.5%
1  1279  11.1%
2  988  8.5%
5  867  7.5%
3  793  6.9%
6  622  5.4%
4  621  5.4%
7  533  4.6%
8  474  4.1%
9  473  4.1%
Uppercase Letter
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%
Connector Punctuation
Value  Count  Frequency (%)
_  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  11558  99.9%
Latin  12  0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  4906  42.4%
1  1279  11.1%
2  988  8.5%
5  867  7.5%
3  793  6.9%
6  622  5.4%
4  621  5.4%
7  533  4.6%
8  474  4.1%
9  473  4.1%
Latin
Value  Count  Frequency (%)
A  4  33.3%
P  2  16.7%
Y  2  16.7%
M  2  16.7%
T  2  16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  11570  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  4906  42.4%
1  1279  11.1%
2  988  8.5%
5  867  7.5%
3  793  6.9%
6  622  5.4%
4  621  5.4%
7  533  4.6%
8  474  4.1%
9  473  4.1%
Other values (6)  14  0.1%

Y
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 28.8 KiB

Length

Max length: 26
Median length: 11
Mean length: 10.141689
Min length: 7

Characters and Unicode

Total characters: 37220
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: default payment next month
2nd row: default
3rd row: default
4th row: not default
5th row: not default

Common Values

Value  Count  Frequency (%)
not default  2873  78.3%
default  795  21.7%
default payment next month  2  0.1%
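The IMBALANCE alert can be reproduced from these counts, assuming the score is one minus the entropy of the class distribution divided by the natural log of the number of classes (0 for perfectly balanced classes, approaching 1 as one class dominates). With Y's three classes, one of which appears to be the header text "default payment next month", this gives 52.0%, the figure listed in the dataset alerts. A minimal sketch under that assumption:

```python
import math

def imbalance(counts: list[int]) -> float:
    """1 - H / ln(k): 0 for perfectly balanced classes, approaching 1 as one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single class is treated as not imbalanced in this sketch
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Y class counts from the table above: not default, default, header text.
score = imbalance([2873, 795, 2])
print(f"{score:.1%}")  # 52.0%
```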

Length

[Figure: Histogram of lengths of the category]
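The length histogram and the length statistics of Y follow directly from the category counts: each category contributes its string length once per occurrence, giving 37220 characters in total and a mean length of 37220 / 3670 = 10.141689. A minimal sketch:

```python
from collections import Counter

def length_histogram(value_counts: dict[str, int]) -> dict[int, int]:
    """Number of observations for each category-string length."""
    hist: Counter = Counter()
    for value, count in value_counts.items():
        hist[len(value)] += count
    return dict(sorted(hist.items()))

y_counts = {"not default": 2873, "default": 795, "default payment next month": 2}
hist = length_histogram(y_counts)
total_chars = sum(length * n for length, n in hist.items())
print(hist)         # {7: 795, 11: 2873, 26: 2}
print(total_chars)  # 37220
print(total_chars / sum(hist.values()))
```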

Common Values (Plot)

Most occurring words

Value  Count  Frequency (%)
default  3670  56.0%
not  2873  43.9%
payment  2  < 0.1%
next  2  < 0.1%
month  2  < 0.1%
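Splitting each category on whitespace and weighting every word by its category's count reproduces these word counts: "default" occurs in all three categories (2873 + 795 + 2 = 3670) out of 6549 words in total. Whitespace splitting is an assumption of this sketch, not necessarily the report's own tokenizer.

```python
from collections import Counter

def word_counts(value_counts: dict[str, int]) -> Counter:
    """Split each category on whitespace and weight each word by the category's count."""
    words: Counter = Counter()
    for value, count in value_counts.items():
        for word in value.split():
            words[word] += count
    return words

y_counts = {"not default": 2873, "default": 795, "default payment next month": 2}
words = word_counts(y_counts)
total = sum(words.values())
for word, n in words.most_common():
    print(word, n, f"{100 * n / total:.1f}%")
```

The sketch prints 0.0% for the rare words, where the report displays "< 0.1%".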

Most occurring characters

Value  Count  Frequency (%)
t  6549  17.6%
e  3674  9.9%
a  3672  9.9%
d  3670  9.9%
f  3670  9.9%
u  3670  9.9%
l  3670  9.9%
n  2879  7.7%
(space)  2879  7.7%
o  2875  7.7%
Other values (5)  12  < 0.1%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  34341  92.3%
Space Separator  2879  7.7%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
t  6549  19.1%
e  3674  10.7%
a  3672  10.7%
d  3670  10.7%
f  3670  10.7%
u  3670  10.7%
l  3670  10.7%
n  2879  8.4%
o  2875  8.4%
m  4  < 0.1%
Other values (4)  8  < 0.1%
Space Separator
Value  Count  Frequency (%)
(space)  2879  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  34341  92.3%
Common  2879  7.7%

Most frequent character per script

Latin
Value  Count  Frequency (%)
t  6549  19.1%
e  3674  10.7%
a  3672  10.7%
d  3670  10.7%
f  3670  10.7%
u  3670  10.7%
l  3670  10.7%
n  2879  8.4%
o  2875  8.4%
m  4  < 0.1%
Other values (4)  8  < 0.1%
Common
Value  Count  Frequency (%)
(space)  2879  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  37220  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
t  6549  17.6%
e  3674  9.9%
a  3672  9.9%
d  3670  9.9%
f  3670  9.9%
u  3670  9.9%
l  3670  9.9%
n  2879  7.7%
(space)  2879  7.7%
o  2875  7.7%
Other values (5)  12  < 0.1%

Correlations

     X10    X11    X2     X3     X4     X6     X7     X8     X9     Y
X10  1.000  0.720  0.708  0.506  0.501  0.538  0.582  0.701  0.770  0.729
X11  0.720  1.000  0.708  0.508  0.500  0.476  0.497  0.599  0.709  0.718
X2   0.708  0.708  1.000  0.708  0.708  0.707  0.708  0.708  0.708  0.707
X3   0.506  0.508  0.708  1.000  0.512  0.508  0.511  0.510  0.508  0.708
X4   0.501  0.500  0.708  0.512  1.000  0.501  0.501  0.501  0.501  0.707
X6   0.538  0.476  0.707  0.508  0.501  1.000  0.710  0.608  0.555  0.755
X7   0.582  0.497  0.708  0.511  0.501  0.710  1.000  0.724  0.614  0.733
X8   0.701  0.599  0.708  0.510  0.501  0.608  0.724  1.000  0.752  0.730
X9   0.770  0.709  0.708  0.508  0.501  0.555  0.614  0.752  1.000  0.724
Y    0.729  0.718  0.707  0.708  0.707  0.755  0.733  0.730  0.724  1.000
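The matrix gives pairwise association scores between the profiled columns. The report does not state here which coefficient it used, so as an illustration only, the sketch below computes plain (uncorrected) Cramér's V from a contingency table of two categorical sequences; a bias-corrected variant would give somewhat different values.

```python
import math
from collections import Counter

def cramers_v(x: list[str], y: list[str]) -> float:
    """Uncorrected Cramér's V between two categorical sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))  # observed contingency counts
    row = Counter(x)
    col = Counter(y)
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(row), len(col))
    return math.sqrt(chi2 / (n * (k - 1))) if k > 1 else 0.0

print(cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"]))  # 1.0
print(cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"]))  # 0.0
```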

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix: a data-dense display that lets you quickly pick out patterns in data completion.
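The overview reports no missing cells. As a sketch of a per-column missing-value count, over rows stored as dictionaries, the function below treats absent keys, None and empty strings as missing; that definition is an assumption, and the report's own missing-value rules are not shown here. The rows are hypothetical.

```python
def missing_per_column(rows: list[dict[str, object]], columns: list[str]) -> dict[str, int]:
    """Count entries per column that are absent, None or an empty string."""
    return {col: sum(1 for r in rows if r.get(col) in (None, "")) for col in columns}

# Hypothetical rows; in the profiled dataset every column reports 0 missing.
rows = [
    {"X1": "20000", "Y": "default"},
    {"X1": "", "Y": "not default"},
    {"Y": None},
]
print(missing_per_column(rows, ["X1", "Y"]))  # {'X1': 2, 'Y': 1}
```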

Sample

X1X2X3X4X5X6X7X8X9X10X11X12X13X14X15X16X17X18X19X20X21X22X23Y
0LIMIT_BALSEXEDUCATIONMARRIAGEAGEPAY_0PAY_2PAY_3PAY_4PAY_5PAY_6BILL_AMT1BILL_AMT2BILL_AMT3BILL_AMT4BILL_AMT5BILL_AMT6PAY_AMT1PAY_AMT2PAY_AMT3PAY_AMT4PAY_AMT5PAY_AMT6default payment next month
120000femaleuniversity12422-1-1-2-23913310268900006890000default
2120000femaleuniversity226-120002268217252682327234553261010001000100002000default
390000femaleuniversity234000000292391402713559143311494815549151815001000100010005000not default
450000femaleuniversity137000000469904823349291283142895929547200020191200110010691000not default
550000maleuniversity157-10-10008617567035835209401914619131200036681100009000689679not default
650000malegraduate school2370000006440057069576081939419619200242500181565710001000800not default
7500000malegraduate school229000000367965412023445007542653483003473944550004000038000202391375013770not default
8100000femaleuniversity2230-1-100-111876380601221-159567380601058116871542not default
9140000femalehigh school1280020001128514096121081221111793371933290432100010001000not default
X1X2X3X4X5X6X7X8X9X10X11X12X13X14X15X16X17X18X19X20X21X22X23Y
3660380000maleuniversity150000000385662294826220022154283352703322701202090096109300033200012000default
366150000maleuniversity144000000453354602730286262752682327371152414279419729921000not default
3662150000femalehigh school143-1-120-1-1264948632316316141410000031614140default
3663220000maleuniversity229000000122286122839123035114385115903118528500850076007500047005503not default
366480000femaleother227000000452684714047411484434947843264260018001700170017001300not default
3665220000femaleuniversity1320000001949611975362032512083552130152174757200900010000800080108500not default
366670000femaleuniversity234122200242082501527189264562836131873150029000250040000not default
3667120000maleuniversity237-12000216241166801769517901196081914310001600800200001600default
3668180000femaleuniversity2320000002073017107358843105729052259331582300001000100010001000not default
366950000femalehigh school157000000490175069047487483194844949656250020002000174620001800not default

Duplicate rows

Most frequently occurring

X1X2X3X4X5X6X7X8X9X10X11X12X13X14X15X16X17X18X19X20X21X22X23Y# duplicates
010000femalehigh school2220000-2-2810997788259000200010360000default2
110000femaleuniversity1310000001591590509901997597368703233022001000333311322not default2
210000femaleuniversity222120000102508558105251005099039984021263903284761287not default2
310000malehigh school223000002697478389002918297299411113412984788470175not default2
410000malehigh school2350000007877891898649673941491561174112031031610002000not default2
510000maleuniversity1321222228425814894819180100521009101632010223500not default2
610000maleuniversity1450002007139841698159508975410192140017000400600200default2
710000maleuniversity15622200020974193397840624196432623000150200200160default2
810000maleuniversity2220000001877318460033576367044511500292710003001000500not default2
910000maleuniversity22200000079609649851886289293503320001000500150002500default2
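The overview lists 1273 duplicate rows (34.7% of 3670), and the table above gives, for each repeated row, how often it occurs. Duplicates can be counted either as redundant rows (every occurrence after the first) or as distinct rows that repeat; this sketch computes both and does not assume which one the report's figure uses. The toy rows are hypothetical.

```python
from collections import Counter

def duplicate_summary(rows: list[tuple], top: int = 10) -> tuple[int, int, list[tuple[tuple, int]]]:
    """Return (redundant rows, distinct repeated rows, most repeated rows with occurrence counts)."""
    counts = Counter(rows)
    redundant = len(rows) - len(counts)  # every occurrence after the first
    repeated = [(row, c) for row, c in counts.items() if c > 1]
    repeated.sort(key=lambda rc: rc[1], reverse=True)
    return redundant, len(repeated), repeated[:top]

rows = [
    ("10000", "female", "high school", "default"),
    ("10000", "female", "high school", "default"),
    ("20000", "male", "university", "not default"),
    ("20000", "male", "university", "not default"),
    ("20000", "male", "university", "not default"),
    ("50000", "female", "university", "not default"),
]
redundant, n_repeated, most_repeated = duplicate_summary(rows)
print(redundant, n_repeated)  # 3 2
for row, count in most_repeated:
    print(row, count)
```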